          HEURISTIC PATTERN MATCHING FOR THE RECOGNITION OF
               NATURAL LANGUAGE DIALOGUE EXPRESSIONS

                       Kenneth Mark Colby
                              and
                       Roger C. Parkison


INTRODUCTION

To recognize something is to identify it as an instance of the
"same again". This familiarity is possible because of recurrent
characteristics of the world which repeat themselves over and over
again. We shall describe an algorithm which recognizes recurrent
characteristics of natural language dialogue expressions through a
multi-stage sequence of functions which progressively transforms
these input expressions into a pattern which eventually best matches
a more abstract stored pattern. The name of the stored pattern has a
pointer to the name of a response function which decides what to do
once the input has been characterized. Here we shall discuss only
the recognizing functions, except for one response function
(anaphoric substitution) which interactively aids the
characterization process. How the response functions operate will be
described in a future communication.
In constructing and testing a simulation of paranoid processes, we
were faced with the problem of reproducing paranoid linguistic
behavior in a diagnostic psychiatric interview. The diagnosis of
paranoid states, reactions, or modes is made by clinicians who judge
a degree of correspondence between what they observe linguistically
in an interview and their conceptual model of paranoid behavior.
There exists a high degree of agreement among psychiatrists about
this conceptual model, which relies mainly on what an interviewee
says and how he says it.
Natural language is a life-expressing code people use for
communication with themselves and others. In a real-life dialogue
such as a psychiatric interview, the participants have interests,
intentions, and expectations which are revealed in their linguistic
expressions. To produce effects on an interviewer which he would
judge similar to the effects produced by a paranoid patient, an
interactive simulation of a paranoid patient must be able to
demonstrate typical paranoid interview behavior. To achieve the
desired effects, the paranoid model must have the ability to deal
with the linguistic behavior of the interviewer.
There are a number of approaches one might consider for an ideal
handling of dialogue expressions. One approach would be to construct
a dictionary of all expressions which could possibly arise in an
interview. Associated with each expression would be its
interpretation, depending on dialogue context, which in turn must
somehow be defined. For obvious reasons, no one takes this approach
seriously. Instead of an expression dictionary, one might construct
a word dictionary and then use projection rules to yield an
interpretation of a sentence from the dictionary definitions. This
has been the approach adopted by most linguistic parsers based on
transformational or systemic grammars. Such a method performs
adequately as long as the dictionary involves only a few hundred
words, each word having only one or two senses, and the dialogue is
limited to a mini-world of a few objects and relations. But the
problems which arise in a real-time psychiatric interview conducted
in unrestricted English are too great for this method to be useful
in actual dialogues in which immediacy of response is an important
requirement.
There is little consensus knowledge about how humans process
natural language. They can be shown to possess some knowledge of
grammar rules, but this does not entail that they use a grammar in
interpreting and producing language. Irregular verb tenses and noun
plurals do not follow rules; yet people use thousands of them. One
school of linguistics believes that people possess full
transformational grammars for processing language. In our view this
position seems dubious. Language is what is recognized, but the
processes involved may not be at all linguistic. Originally,
transformational grammars were not designed to "understand" a large
subset of English; they represented a set of axioms for deciding
whether a string is "grammatical". Efforts to use them for other
purposes have not been very fruitful.
An analysis of what one's problem actually is should guide the
selection or invention of methods appropriate to that problem's
solution. Our problem was not to develop a consistent and general
theory of language, nor to assert empirically testable hypotheses
about how people process language. Our problem was to design an
algorithm which recognizes what is being said in a dialogue and what
is being said about it, in order to make a response such that a
sample of I-O pairs from the paranoid model is judged similar to a
sample of I-O pairs from paranoid patients. From the perspective of
artificial intelligence as an attempt to get computers to perform
human intellectual tasks, new methods had to be devised for the task
of participating in a human dialogue in a paranoid-patient-like way.
We are not making an existence claim that our strategy represents
the way people process language. We sought effective methods which
could operate efficiently in real time. We would not deny that our
methods for this task are possibly analogous to the methods humans
use. And since our methods provide a general way of many-to-one
mapping from "surface" expressions to a single stored pattern, these
methods can be used by any type of "host" system which takes natural
language as input.
To perform the task of managing communicative uses and effects of
natural language, we adopted a strategy of transforming the input
until a pattern is achieved which matches, completely or partially,
a more abstract stored pattern. This strategy has proved adequate
for our purposes a satisfactory percentage of the time. (No one
expects any method to be successful 100% of the time, since not even
humans, the best natural language processors around, achieve this
level of performance.) The power of this method for natural language
dialogues lies in its ability to ignore unrecognizable expressions
and irrelevant details. A conventional parser doing word-by-word,
parts-of-speech analysis fails when it cannot find one or more of
the input words in its dictionary. It is too fragile for a dialogue
in that it must know every word.
In early versions of the paranoid model (PARRY1), many (but not
all) of the pattern recognition mechanisms were weak because they
allowed the elements of the pattern to be order-independent (Colby,
Weber, and Hilf, 1971). For example, consider the following
expressions:

        (1) WHERE DO YOU WORK?
        (2) WHAT SORT OF WORK DO YOU DO?
        (3) WHAT IS YOUR OCCUPATION?
        (4) WHAT DO YOU DO FOR A LIVING?
        (5) WHERE ARE YOU EMPLOYED?

In PARRY1 a procedure would scan these expressions looking for an
information-bearing contentive such as "work", "for a living", etc.
If it found such a contentive along with a "you" or "your" in the
expression, regardless of word order, it would respond to the
expression as if it were a question about the nature of one's work.
(There is some doubt this even qualifies as a pattern since
interrelations between words are ignored and only their presence is
considered.) An insensitivity to word order has the advantage that
lexical items representing different parts of speech can represent
the same concept, e.g. "work" as noun or as verb. But a price is
paid for this resilience and elasticity. We found from experience
that, since English relies heavily on word order to convey the
meaning of its messages, the penalty of misunderstanding (to be
distinguished from ununderstanding) was too great. Hence in PARRY2,
as will be described shortly, all the patterns require a specified
word order.
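
To make the contrast concrete, here is a minimal sketch in Python
(not the model's actual code; the word lists are illustrative
assumptions) of an order-independent test of the PARRY1 kind beside
an order-sensitive one:

    # Illustrative sketch only; CONTENTIVES is an assumed word list.
    CONTENTIVES = {"WORK", "JOB", "OCCUPATION"}

    def parry1_style_match(words):
        # PARRY1: fire whenever a contentive and "you"/"your"
        # co-occur, regardless of word order.
        return (bool(CONTENTIVES & set(words))
                and bool({"YOU", "YOUR"} & set(words)))

    def ordered_match(words, pattern):
        # PARRY2-style: the pattern elements must occur in this order.
        remaining = iter(words)
        return all(element in remaining for element in pattern)

    print(parry1_style_match("WHERE DO YOU WORK".split()))      # True
    print(parry1_style_match("DOES YOUR MOTHER WORK".split()))  # True:
    # a misfire that an order-sensitive pattern would avoid
    print(ordered_match("WHAT IS YOUR OCCUPATION".split(),
                        ["WHAT", "IS", "YOUR", "OCCUPATION"]))  # True
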
It seems agreed that for high-complexity problems it is useful to
have constraints. Diagnostic psychiatric interviews (and especially
those conducted over teletypes) have several natural constraints.
First, clinicians are trained to ask certain questions in certain
ways. These stereotypes can be treated as special cases. Second,
only a few hundred standard topics are brought up by interviewers,
who are trained to use everyday expressions and especially those
used by the patient himself. When the interview is conducted over
teletypes, expressions tend to be shortened since the interviewer
tries to increase the information transmission rate over the slow
channel of a teletype. (It is said that short expressions are more
grammatical, but think about the phrase "Now now, there there.")
Finally, teletyped interviews represent written speech. Speech is
known to be 50% redundant; hence many unrecognized words can be
ignored without losing the meaning of the message. Written speech is
loaded with idioms, cliches, pat phrases, etc. - all being easy prey
for a pattern-recognition approach. It is time-wasting and usually
futile to try to decode an idiom by analyzing the meanings of its
individual words.

We shall now describe the pattern recognition functions of the
algorithm in some detail.


PREPROCESSING

Each word in the input expression is first looked up in a
dictionary of (currently) 1600 entries which, for the sake of speed,
is maintained in core during run-time. The dictionary, which was
built empirically from thousands of teletyped interviews with
previous versions of the model, consists of words, groups of words,
and names of the word-classes they can be translated into. The size
of the dictionary is determined by the patterns, i.e. each
dictionary word appears in one or more patterns. Entries in the
dictionary reflect PARRY2's main interests. If a word in the input
is not in the dictionary, it is dropped from the pattern being
formed. Thus if the input were:

        WHAT IS YOUR CURRENT OCCUPATION?

and the word "current" is not in the dictionary, the pattern at this
stage becomes:

        ( WHAT IS YOUR OCCUPATION )
The question-mark is thrown away as redundant since questions are
recognized by word order. (A statement followed by a question mark
(YOU GAMBLE?) is considered to be communicatively equivalent in its
effects to that statement followed by a period.) Synonymic
translations of words are made so that the pattern becomes, for
example:

        ( WHAT BE YOU JOB )

Groups of words are translated into a single word-class name, so
that, for example, "for a living" becomes "job".
Misspellings are also handled in the dictionary by simply rewriting
a recognized misspelling into its correct form. Thus "yyou" becomes
"you". The common misspellings were gathered from over 4000
interviews with earlier versions of the paranoid model. Other
misspellings do not appear in the pattern simply because they are
not represented in the dictionary.
Certain juxtaposed words are contracted into a single word, e.g.
"GET ALONG WITH" becomes "GETALONGWITH". This is done to deal with
groups of words which are represented as a single element in the
stored pattern, thereby preventing segmentation from occurring at
the wrong places, such as at a preposition inside an idiom. Besides
these contractions, certain expansions are made so that, for
example, "DON'T" becomes "DO NOT" and "I'D" becomes "I WOULD".
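
The whole preprocessing pass can be summarized in a short sketch.
The Python below is illustrative only; the table contents are
assumptions, and the real dictionary holds about 1600 entries:

    DICTIONARY = {   # word -> word-class it translates into
        "WHAT": "WHAT", "IS": "BE", "ARE": "BE", "BE": "BE",
        "YOU": "YOU", "YOUR": "YOU",
        "OCCUPATION": "JOB", "WORK": "JOB",
    }
    MISSPELLINGS = {"YYOU": "YOU"}
    EXPANSIONS = {"DON'T": ["DO", "NOT"], "I'D": ["I", "WOULD"]}
    GROUPS = {("FOR", "A", "LIVING"): "JOB",
              ("GET", "ALONG", "WITH"): "GETALONGWITH"}

    def preprocess(text):
        # Punctuation is discarded; questions are recognized by
        # word order, not by the question mark.
        words = []
        for w in text.upper().rstrip("?.!").split():
            w = MISSPELLINGS.get(w, w)            # "YYOU" -> "YOU"
            words.extend(EXPANSIONS.get(w, [w]))  # "DON'T" -> "DO NOT"
        pattern, i = [], 0
        while i < len(words):
            for group, name in GROUPS.items():    # contract word groups
                if tuple(words[i:i + len(group)]) == group:
                    pattern.append(name)
                    i += len(group)
                    break
            else:
                if words[i] in DICTIONARY:        # unknown words dropped
                    pattern.append(DICTIONARY[words[i]])
                i += 1
        return pattern

    print(preprocess("What is your current occupation?"))
    # -> ['WHAT', 'BE', 'YOU', 'JOB']  ("current" is not in the
    # dictionary and so drops out of the pattern)
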
SEGMENTING

Another weakness in the crude pattern matching of PARRY1 was that
it took the entire input expression as its basic processing unit.
Hence if only two words were recognized in an eight-word input, the
risk of misunderstanding was great. We needed a way of dealing with
units shorter than the entire input expression.
Expert telegraphists stay six to twelve words behind a received
message before transcribing it. Translators wait until they have
heard 4-6 words before they begin translating. Aided by a heuristic
from work in machine translation (Wilks, 1973), we devised a way of
bracketing the pattern constructed up to this point into shorter
segments, using prepositions, wh-forms, certain verbs, etc. as
bracketing points. (A complete listing of bracketing terms will be
sent upon request.) The new pattern formed is termed either
"simple", having no delimiters within it, or "compound", i.e. made
up of two or more simple patterns. A simple pattern might be:

        ( WHAT BE YOU JOB )

whereas a compound pattern would be:

        (( WHY BE YOU ) ( IN HOSPITAL ))

Our experience with this method of segmentation shows that compound
patterns from teletyped psychiatric dialogues rarely consist of more
than three or four fragments.
After certain verbs ("THINK", "FEEL", etc.) a bracketing occurs to
replace the commonly omitted "THAT", such that:

        ( I THINK YOU BE AFRAID )

becomes

        (( I THINK ) ( YOU BE AFRAID ))
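
A minimal sketch of such a segmenter, with an assumed (and much
abbreviated) delimiter list; the model's actual list of bracketing
terms is far longer:

    # Illustrative delimiter lists only.
    BREAK_BEFORE = {"WHY", "WHAT", "WHERE", "HOW",
                    "IN", "ON", "AT", "TO"}
    BREAK_AFTER = {"THINK", "FEEL"}   # for the commonly omitted "THAT"

    def segment(pattern):
        segments, current = [], []
        for word in pattern:
            if word in BREAK_BEFORE and current:
                segments.append(current)   # bracket before the delimiter
                current = []
            current.append(word)
            if word in BREAK_AFTER:
                segments.append(current)   # bracket after certain verbs
                current = []
        if current:
            segments.append(current)
        return segments

    print(segment(["WHY", "BE", "YOU", "IN", "HOSPITAL"]))
    # -> [['WHY', 'BE', 'YOU'], ['IN', 'HOSPITAL']]
    print(segment(["I", "THINK", "YOU", "BE", "AFRAID"]))
    # -> [['I', 'THINK'], ['YOU', 'BE', 'AFRAID']]
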

PREPARATION FOR MATCHING

Conjunctions serve only as markers for the segmenter, and they are
dropped after segmentation.
Negations are handled by extracting the "NOT" from the pattern and
assigning a value to a global variable which indicates to the
algorithm that the expression is negative in form. When a pattern is
finally matched, this variable is consulted. Some patterns have a
pointer to a pattern of opposite meaning if a "NOT" could reverse
their meanings. If this pointer is present and a "NOT" is found,
then the pattern matched is replaced by its opposite, e.g.
( I NOT TRUST YOU ) is replaced by the pattern ( I MISTRUST YOU ).
We have not yet observed the troublesome case of "he gave me not one
but two messages". (There is no need to scratch where it doesn't
itch.)
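
A sketch of this mechanism, with a hypothetical OPPOSITES table
standing in for the opposite-meaning pointers stored with some
patterns:

    OPPOSITES = {("I", "TRUST", "YOU"): ("I", "MISTRUST", "YOU")}

    def extract_not(segment):
        # Pull "NOT" out of the segment; the flag records that the
        # expression is negative in form.
        negative = "NOT" in segment
        return [w for w in segment if w != "NOT"], negative

    def resolve_negation(matched_pattern, negative):
        # Consulted once a pattern has finally been matched.
        key = tuple(matched_pattern)
        if negative and key in OPPOSITES:
            return list(OPPOSITES[key])
        return matched_pattern

    seg, negative = extract_not(["I", "DO", "NOT", "TRUST", "YOU"])
    # (fuzzy matching, described below, would reduce the segment
    # to I TRUST YOU)
    print(resolve_negation(["I", "TRUST", "YOU"], negative))
    # -> ['I', 'MISTRUST', 'YOU']
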

MATCHING AND RECYCLING

The algorithm now attempts to match the segmented patterns with
stored patterns, which currently number about 2000, 1600 being
simple and 400 being compound. First a complete and perfect match is
sought. When a match is found, the stored pattern name has a pointer
to the name of a response function which decides what to do further.
If a match is not found, further transformations of the pattern are
carried out and a "fuzzy" match is tried.
For fuzzy matching at this stage, we invented the heuristic of
dropping elements in the pattern one at a time and attempting a
match each time. This heuristic allows ignoring familiar words in
unfamiliar contexts. For example, "well" is important in "Are you
well?" but meaningless in "Well are you?".
Deleting one element at a time results in, for example, the
pattern:

        ( WHAT BE YOU MAIN PROBLEM )

becoming successively:

        (a) ( BE YOU MAIN PROBLEM )
        (b) ( WHAT YOU MAIN PROBLEM )
        (c) ( WHAT BE MAIN PROBLEM )
        (d) ( WHAT BE YOU PROBLEM )
        (e) ( WHAT BE YOU MAIN )

Since the stored pattern in this case matches (d), (e) would not be
constructed. We found it unwise to delete more than one element,
since our segmentation method usually yields segments containing a
small (1-4) number of words.
Dropping one element at a time provides a probability threshold for
fuzzy matching which is a function of the length of the segment. If
a segment consists of five elements, four of the five must be
present in a particular order (with the fifth element missing in any
position) for a match to occur. If a segment contains four elements,
three must match - and so forth.
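
A sketch of the fuzzy match in Python. STORED is an illustrative
stand-in for the stored patterns, and "PROBLEM-QUESTION" is a
hypothetical response-function name, not one of the model's:

    STORED = {("WHAT", "BE", "YOU", "PROBLEM"): "PROBLEM-QUESTION"}

    def fuzzy_match(segment):
        key = tuple(segment)
        if key in STORED:              # perfect match sought first
            return STORED[key]
        for i in range(len(key)):      # then drop one element at a time
            reduced = key[:i] + key[i + 1:]
            if reduced in STORED:
                return STORED[reduced]
        return None                    # no match: default condition

    print(fuzzy_match(["WHAT", "BE", "YOU", "MAIN", "PROBLEM"]))
    # -> 'PROBLEM-QUESTION', found via deletion (d) in the example
    # above; deletion (e) is never constructed
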
The transformations described above result in a progressive
coarsening of the patterns by deletion. Substitutions are also made
in certain cases. Some patterns contain pronouns which could stand
for a number of different things of importance to PARRY2. As we
mentioned in the introduction, there is one case in which the
response functions of memory aid in language recognition. One
function of memory is to keep track of the context in order to give
pronouns and other anaphors a correct interpretation. For example,
the pattern:

        ( DO YOU AVOID THEM )

could refer to the Mafia, or racetracks, or other patients,
depending on the context. When such a pattern is recognized, the
pronoun is replaced by its current anaphoric value as determined by
the response functions, and a more specific pattern such as:

        ( DO YOU AVOID MAFIA )

is looked up. In many cases, the meaning of a pattern containing a
pronoun is clear without any substitution. In the pattern:

        (( HOW DO THEY TREAT YOU ) ( IN HOSPITAL ))

the meaning of THEY is clarified by ( IN HOSPITAL ).
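
A sketch of the substitution step, assuming the response functions
(not shown) have already set the current referent - here,
hypothetically, the Mafia:

    ANAPHORA = {"THEM": "MAFIA", "THEY": "MAFIA"}  # kept by memory

    def substitute_anaphora(segment, stored):
        if tuple(segment) in stored:
            return segment             # meaning clear; no substitution
        specific = [ANAPHORA.get(w, w) for w in segment]
        if tuple(specific) in stored:
            return specific            # look up the specific pattern
        return segment

    stored = {("DO", "YOU", "AVOID", "MAFIA")}
    print(substitute_anaphora(["DO", "YOU", "AVOID", "THEM"], stored))
    # -> ['DO', 'YOU', 'AVOID', 'MAFIA']
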

COMPOUND-PATTERN MATCH

When more than one simple pattern is detected in the input, a
second matching is attempted. The heuristic methods used are similar
to those of the first matching, except that they occur at the
segment level rather than at the single-element level. Certain
patterns, such as ( HELLO ) and ( I THINK ), are dropped because
they are considered meaningless. If a complete match is not found,
then simple patterns are dropped, one at a time, from the compound
pattern. This allows the input,

        (( HOW DO YOU COME ) ( TO BE ) ( IN HOSPITAL ))

to match the stored pattern,

        (( HOW DO YOU COME ) ( IN HOSPITAL )).
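
The segment-level heuristic mirrors the element-level one. A sketch,
with an illustrative stored pattern and a hypothetical
response-function name:

    MEANINGLESS = {("HELLO",), ("I", "THINK")}
    STORED = {(("HOW", "DO", "YOU", "COME"),
               ("IN", "HOSPITAL")): "WHY-IN-HOSPITAL"}

    def match_compound(segments):
        segs = [tuple(s) for s in segments
                if tuple(s) not in MEANINGLESS]  # drop ( HELLO ), etc.
        if tuple(segs) in STORED:
            return STORED[tuple(segs)]
        for i in range(len(segs)):               # drop one segment at a
            reduced = tuple(segs[:i] + segs[i + 1:])  # time and retry
            if reduced in STORED:
                return STORED[reduced]
        return None                              # default condition

    print(match_compound([["HOW", "DO", "YOU", "COME"],
                          ["TO", "BE"], ["IN", "HOSPITAL"]]))
    # -> 'WHY-IN-HOSPITAL'  (after ( TO BE ) is dropped)
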

If no match can be found at this point, the algorithm has arrived
at a default condition and the appropriate response functions decide
what to do. For example, in a default condition, the model may
assume control of the interview, asking the interviewer a question,
continuing with the topic under discussion, or introducing a new
topic.

An example of interview dialogue is presented in the appendix,
showing the transformations the input expressions undergo and the
patterns they match.

ADVANTAGES AND LIMITATIONS

As mentioned, one of the main advantages of a characterization
strategy is that it can ignore as irrelevant what it does NOT
recognize. There are several million words in English, each
possessing one to over a hundred senses. To construct a
machine-usable word dictionary of this magnitude is out of the
question at this time. Recognition of natural language input such as
described above allows real-time interaction in a dialogue, since it
avoids becoming ensnarled in combinatorial disambiguations and long
chains of inferencing which would slow a dialogue algorithm down to
impracticality, if it could function at all. The price paid for
pattern matching is that sometimes, but rarely, ambiguities slip
through.
A drawback to PARRY1 was that it reacted to the first pattern it
found in the input rather than characterizing the input as fully as
possible and then deciding what to do based on a number of tests.
Another practical difficulty with PARRY1, from a programmer's
viewpoint, was that elements of the patterns were strung out in
various procedures throughout the algorithm. It was often a
considerable chore for the programmer to determine whether a given
pattern was present and precisely where it was. In the
above-described method, the patterns are all collected in one part
of the data-base, where they can easily be examined.
Concentrating all the patterns in the data-base gives PARRY2 a
limited "learning" ability. When an input fails to match any stored
pattern or matches an incorrect one, as judged by a human operator,
a pattern which matches the input can be put into the data-base
automatically. If the new pattern has the same meaning as a
previously stored pattern, the human operator must provide the name
of the appropriate response function. If he doesn't remember the
name, he may try to rephrase the input in a form recognizable to
PARRY2, and it will name the response function associated with the
rephrasing. These mechanisms are not "learning" in the commonly used
sense, but they do allow a person to transfer his knowledge into
PARRY2's data-base with very little redundant effort.
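
A sketch of that transfer mechanism, with hypothetical pattern and
function names throughout:

    # PATTERNS maps each stored pattern to a response-function name.
    PATTERNS = {("WHAT", "BE", "YOU", "JOB"): "ANSWER-JOB-QUESTION"}

    def add_pattern(new_pattern, function_name=None, rephrasing=None):
        # The operator either supplies the response-function name or
        # rephrases the input in a form PARRY2 already recognizes.
        if function_name is None and rephrasing is not None:
            function_name = PATTERNS.get(tuple(rephrasing))
        if function_name is not None:
            PATTERNS[tuple(new_pattern)] = function_name
        return function_name

    add_pattern(["WHAT", "BE", "YOU", "TRADE"],
                rephrasing=["WHAT", "BE", "YOU", "JOB"])
    print(PATTERNS[("WHAT", "BE", "YOU", "TRADE")])
    # -> 'ANSWER-JOB-QUESTION'
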
Informal observation thus far shows PARRY2's linguistic recognition
abilities to be quite superior to PARRY1's. A more systematic and
quantitative evaluation of performance will be carried out in the
future.

REFERENCES

Colby, K.M., Weber, S., and Hilf, F.D. (1971). Artificial Paranoia.
ARTIFICIAL INTELLIGENCE, 2, 1-25.

Wilks, Y. (1973). An artificial intelligence approach to machine
translation. In COMPUTER MODELS OF THOUGHT AND LANGUAGE,
Schank, R. and Colby, K.M., Eds., W.H. Freeman, San Francisco.